Field of the Invention.

[0001] This invention relates to the field of molecular biology, and in 
particular to the development and use of vectors for the expression of heterologous 
genetic sequences in transformed cells.

Description of the Related Art.

[0002] Typical expression vectors contain promoters to drive the gene of
interest as well as polyadenylation signals to generate a mature transcript. Promoter 
sequences tend to be only a few hundred base pairs in length and contain most, if not 
all, of the regulatory regions for optimal expression as determined by transient 
transfection. However, expression constructs containing these sequences, although 
highly functional in transient transfections, are not always able to confer a similar level 
of expression when integrated into the chromatin as a stable transfectant. This is due to 
position-dependent expression, a phenomenon in which the site of integration has a 
dominant effect, usually negative, on the level of expression (Wilson (1990), Ann. Rev.
Cell Biol. 6:679-714). The result of position-dependent expression is evident in the
results of a transfection screening, in which most of the cell lines produce little or no 
product. Therefore, it is usually necessary to screen a large number of transfectants in 
order to identify a single high-expressing clone. Even after extensive screening, 
transfectants obtained using standard expression vectors typically have expression 
levels that would not be sufficient to meet commercial titer goals. 

[0003] The time-consuming and labor-intensive process of DHFR
amplification is frequently employed to increase expression levels in stable
transfectants. For example, integrated copies of standard expression constructs
typically require amplification to greater than 100 copies in order to approach the level
of expression of endogenous genes with promoters of similar strength (from only two
alleles). The differences between standard expression vectors and endogenous genes
are most likely due to the presence of sequences 5' to the promoter and/or 3' to the
polyadenylation signal of the endogenous genes that are able to confer a chromatin
configuration more favourable for expression. An expression construct containing
sequences that can confer favourable position-independent chromatin configurations,
regardless of the integration site(s), would be advantageous for generating cell lines
highly expressing heterologous genes.



SUMMARY OF THE INVENTION 
[0004] The present invention depends, in part, upon the development of high
expression "locus vectors" derived from the ferritin heavy chain gene. The concept of a
"locus vector" is based on the observation that the regions found 5' and 3' to highly
expressed genes in their natural chromatin contexts can confer higher levels of
expression to a heterologous gene. Therefore, the present invention provides ferritin
heavy chain gene locus vectors which include 5' and 3' sequences which can convey
high levels of expression to heterologous genes in stable transfectants. Thus, the
invention provides genetic vectors for the stable transfection and expression at high
levels of a desired protein within eukaryotic cells.

[0005] In one aspect, the invention provides genetic vectors for stable
transfection and expression of a desired protein within eukaryotic cells including:
(a) distal 5' flanking sequences of a eukaryotic locus; (b) proximal 5' regulatory
sequences of a eukaryotic locus; (c) at least a first insertion site for a heterologous
sequence; and (d) proximal 3' regulatory sequences effective for transcription
termination of a eukaryotic locus; in which these sequences are operably joined in the
order (a)-(d) in a 5' to 3' orientation, with optional linker sequences between adjacent
sequences; and in which (1) the distal 5' flanking sequences comprise a sequence of at
least 100 bases having at least 70% identity to a nucleotide sequence found between 20
bp and 100,000 bp 5' of a transcriptional initiation site of a ferritin heavy chain locus;
and/or (2) the proximal 5' regulatory sequences comprise a sequence of at least 20
bases having at least 70% identity to a nucleotide sequence found between 1 bp and
10,000 bp 5' of a translational initiation codon of a ferritin heavy chain locus.

[0006] In another aspect, the vector includes at least a first heterologous
coding sequence encoding a desired protein. Thus, the invention provides genetic
vectors for stable transfection and expression of a desired protein within eukaryotic
cells including: (a) distal 5' flanking sequences of a eukaryotic locus; (b) proximal 5'
regulatory sequences of a eukaryotic locus; (c) at least a first heterologous coding
sequence encoding said desired protein; and (d) proximal 3' regulatory sequences
effective for transcription termination of a eukaryotic locus; in which these sequences
are operably joined in the order (a)-(d) in a 5' to 3' orientation, with optional linker
sequences between adjacent sequences; and in which (1) the distal 5' flanking
sequences comprise a sequence of at least 100 bases having at least 70% identity to a
nucleotide sequence found between 20 bp and 100,000 bp 5' of a transcriptional
initiation site of a ferritin heavy chain locus; and/or (2) the proximal 5' regulatory
sequences comprise a sequence of at least 20 bases having at least 70% identity to a
nucleotide sequence found between 1 bp and 10,000 bp 5' of a translational initiation
codon of a ferritin heavy chain locus.

[0007] In some embodiments, the distal 5' flanking sequences are derived
from a ferritin heavy chain locus. In other embodiments, the proximal 5' regulatory
sequences are derived from a ferritin heavy chain locus. In yet other embodiments,
both the proximal 5' regulatory sequences and the distal 5' flanking sequences are
derived from a ferritin heavy chain locus.

[0008] In some embodiments, the proximal 3' regulatory sequences are
derived from a ferritin heavy chain locus, and in some embodiments the vector further
includes distal 3' flanking sequences of a ferritin heavy chain locus.

[0009] In certain embodiments of the invention, the insertion site for a
heterologous sequence includes at least one restriction endonuclease site, and in other
embodiments the insertion site for a heterologous sequence is a polylinker site
including at least two restriction endonuclease sites.

[0010] In certain embodiments of the invention, the proximal 5' regulatory
sequences include a eukaryotic intron sequence. In some of these embodiments, the
eukaryotic intron sequence is derived from intron 1 of a ferritin heavy chain gene. In
certain embodiments, the proximal 5' regulatory sequences include untranslated exon
sequences.

[0011] In some embodiments, the distal 5' flanking sequences and the
proximal 5' regulatory sequences have a total length of between 1,000 and 10,000
bases. Similarly, in some embodiments, the proximal 3' regulatory sequences and any
distal 3' flanking sequences have a total length of between 1,000 and 10,000 bases.

[0012] In another aspect, the invention provides eukaryotic cells transfected
with any of the vectors of the invention. In some embodiments, the vector has stably
integrated into a chromosome of said cell and, in some embodiments, the first
heterologous coding sequence is expressed in said cell.

[0013] In some embodiments, the invention provides eukaryotic cells
including: (a) distal 5' flanking sequences of a eukaryotic locus; (b) proximal 5'
regulatory sequences of a eukaryotic locus; (c) at least a first coding sequence; and
(d) proximal 3' regulatory sequences effective for transcription termination of a
eukaryotic locus; in which the sequences are operably joined in order (a)-(d) in a 5' to 3'
orientation, with optional linker sequences between adjacent sequences; and in which
(1) the distal 5' flanking sequences comprise an exogenous sequence of at least 100
bases having at least 70% identity to a nucleotide sequence found between 20 bp and
100,000 bp 5' of a transcriptional initiation site of a ferritin heavy chain locus; and/or
(2) the proximal 5' regulatory sequences comprise an exogenous sequence of at least 20
bases having at least 70% identity to a nucleotide sequence found between 1 bp and
10,000 bp 5' of a translational initiation codon of a ferritin heavy chain locus.

[0014] In another aspect, the invention provides a eukaryotic cell including an
exogenous 5' distal flanking sequence derived from a ferritin heavy chain locus
operably joined to a coding sequence.

[0015] In another aspect, the invention provides a method of producing a 
desired protein in a eukaryotic cell including the steps of (a) providing at least one cell 
of the invention or a descendent thereof; (b) maintaining the cell in a culture under 
conditions which permit high expression of the desired protein; and (c) isolating the
desired protein from the culture. 

[0016] These and other aspects and advantages of the invention will be 
apparent to those of skill in the art from the detailed description and examples which 
follow. 

BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The following drawings are illustrative of embodiments of the 
invention and are not meant to limit the scope of the invention as encompassed by the 
claims. 

[0018] Figure 1 shows rat ferritin heavy chain exon sequences.

[0019] Figure 2 illustrates one example of the subcloning of the region
containing the ferritin heavy chain exons into the Litmus 38 plasmid.

[0020] Figure 3 illustrates the deletion of exons 2, 3, and 4 from pFerX1 and
insertion of a polylinker to generate plasmid pFerX2.

[0021] Figure 4 illustrates the deletion of the exon 1 coding region from
pFerX2 to generate plasmid pFerX3, and deletion of the IRE to generate plasmid
pFerX4.

[0022] Figure 5 A-B illustrates the removal of exons 2 through 4 of the
ferritin heavy chain gene from cosmid 15A using PCR fusion.

[0023] Figure 6 illustrates the insertion of the PCR fusion product of Figure 5
into the HpaI and AatII sites of pFerX4 to generate plasmid pFerX5.

[0024] Figure 7 illustrates the removal of the SwaI site from pFerX5 to
generate plasmid pFerX5.1.

[0025] Figure 8 illustrates the addition of the distal 3' flanking sequences to
pFerX6 to generate pFerX7.

[0026] Figure 9 illustrates the addition of the distal 5' flanking sequences of
the ferritin heavy chain gene to pFerX7 to generate plasmid pFerX8.

[0027] Figure 10 illustrates the genetic map of plasmid pFerX8, including the
sources of the sequences.

[0028] Figure 11 illustrates the genetic map of plasmid pFerX9, including the
sources of the sequences.

[0029] Figure 12 illustrates the sequence of the transcribed region of the
pFerX8 and pFerX9 plasmids.

[0030] Figure 13 illustrates the genetic map of pSIDHFR.2, a DHFR
expression plasmid.

[0031] Figure 14 shows the results of experiments measuring reporter gene
expression in pools of transfectants.

[0032] Figure 15 shows the results of experiments measuring reporter gene
expression in transfected isolates.

DETAILED DESCRIPTION 
[0033] The patent, scientific and medical publications referred to herein
establish knowledge that was available to those of ordinary skill in the art at the time
the invention was made. The entire disclosures of the issued U.S. patents, published
and pending patent applications, and other references cited herein are hereby
incorporated by reference.

Definitions.

[0034] All technical and scientific terms used herein, unless otherwise defined
below, are intended to have the same meaning as commonly understood by one of
ordinary skill in the art; references to techniques employed herein are intended to refer
to the techniques as commonly understood in the art, including variations on those
techniques or substitutions of equivalent techniques which would be apparent to one of
skill in the art. In order to more clearly and concisely describe the subject matter which
is the invention, the following definitions are provided for certain terms which are used
in the specification and appended claims.

[0035] Eukaryotic Locus. As used herein, the term "eukaryotic locus" refers
to any chromosomal genetic locus of a eukaryotic cell which encodes a polypeptide or
RNA product which can be expressed in the cell under appropriate conditions.
Mitochondrial loci are expressly excluded from the scope of the term "eukaryotic
locus" as used herein.

[0036] Distal 5' Flanking Sequences. As used herein, the term "distal 5'
flanking sequences" refers to flanking nucleotide sequences which are 5' of the
proximal 5' regulatory sequences of a gene. Thus, although these sequences can have
an effect on transcription rates because of their effects on chromatin structure, these
sequences are generally 5' of the basic regulatory sequences (e.g., operators, promoters,
ribosome-binding sites) and further removed from the transcriptional initiation site than
the proximal 5' regulatory sequences. The size of the distal 5' flanking sequences can
range between 100-100,000 bases. In certain embodiments, the distal 5' flanking
sequences will include between 500-50,000 bases, 750-25,000 bases or 1,000-10,000
bases. The distal 5' flanking sequences can begin anywhere 5' of the proximal 5'
regulatory sequences, and typically begin 20 bases, 50 bases, 75 bases, 100 bases, 500
bases, 1,000 bases, 5,000 bases or 10,000 bases 5' of the transcription initiation site.
Distal 5' flanking sequences can extend for substantial distances 5' of the promoter and
transcriptional initiation sequences of a gene, and typically end 100,000 bases, 50,000
bases, 25,000 bases or 10,000 bases 5' of the transcription initiation site.

[0037] Proximal 5' Regulatory Sequences. As used herein, the term "proximal
5' regulatory sequences" refers to nucleotide sequences which are located near the 5'
end of a gene and which include the basic regulatory elements (i.e., the promoter and, if
present, operator and ribosome binding sequences) necessary for transcription and
translation. The size of the proximal 5' regulatory sequences can range between
20-10,000 bases. In certain embodiments, the proximal 5' regulatory sequences will
include between 50-5,000 bases, 75-1,000 bases or 100-500 bases. In some
embodiments, the 3' end of the proximal 5' regulatory sequences can be defined as
immediately 5' of the translation initiation or "start" codon of the coding region.
Alternatively, in some embodiments, the proximal 5' regulatory sequences can include
sequences internal to the gene including intron sequences and, therefore, the 3' end of
the proximal 5' regulatory sequences can extend to the intron sequences. Moreover, in
some embodiments, the proximal 5' regulatory sequences can include some 5' coding
sequences (e.g., the start codon and/or a short N-terminal sequence). Proximal 5'
regulatory sequences extend 5' of the transcriptional initiation site, and can end 10,000
bases, 5,000 bases, 1,000 bases, 500 bases, 100 bases, 75 bases, 50 bases or 20 bases 5'
of the transcriptional initiation site.

[0038] Proximal 3' Regulatory Sequences. As used herein, the term "proximal
3' regulatory sequences" refers to nucleotide sequences which are located near the 3'
end of a gene and which include the basic regulatory elements (i.e., the translational
termination codon, polyadenylation signal and transcriptional terminator) necessary for
proper mRNA processing and translation termination. The size of the proximal 3'
regulatory sequences can range between 10-2,000 bases. In certain embodiments, the
proximal 3' regulatory sequences will include between 25-1,000 bases, 50-750 bases or
75-500 bases. The 5' end of the proximal 3' regulatory sequences can be defined by the
translational termination or "stop" codon (i.e., TAG, TAA or TGA). Proximal 3'
regulatory sequences extend 3' of the translational termination codon, and can end
2,000 bases, 1,000 bases, 750 bases or 500 bases 3' of the translational termination
codon.



[0039] Distal 3' Flanking Sequences. As used herein, the term "distal 3'
flanking sequences" refers to flanking nucleotide sequences which are 3' of the
proximal 3' regulatory sequences of a gene. Thus, these sequences are 3' of the basic
regulatory sequences (i.e., the stop codon, and polyadenylation signal) necessary for
proper mRNA processing and translation termination, and are further removed from the
transcriptional termination site than the proximal 3' regulatory sequences. The size of
the distal 3' flanking sequences can range between 100-100,000 bases. In certain
embodiments, the distal 3' flanking sequences will include between 500-50,000 bases,
750-25,000 bases or 1,000-10,000 bases. The distal 3' flanking sequences can begin
anywhere 3' of the proximal 3' regulatory sequences, and typically begin 500 bases, 750
bases, 1,000 bases or 2,000 bases 3' of the translation termination codon. Distal 3'
flanking sequences can extend for substantial distances 3' of the transcriptional
termination codon and polyadenylation sequences of a gene, and typically end 100,000
bases, 50,000 bases, 25,000 bases or 10,000 bases 3' of the transcriptional termination
codon.

[0040] Vector. As used herein, the term "vector" means any genetic
construct, such as a plasmid, phage, transposon, cosmid, chromosome, virus, virion,
etc., which is capable of transferring nucleic acids between cells. Vectors may be capable
of one or more of replication, expression, recombination, insertion or integration, but 
need not possess each of these capabilities. Thus, the term includes cloning and 
expression vectors. 

[0041] Transfection. As used herein, the term "transfection" means the
introduction into a cell or an organism of a vector that replicates within that cell or
organism or that expresses a polypeptide sequence in that cell or organism with or
without integrating into the genome of that cell or organism. The term "transfection" is
used to embrace all of the various methods of introducing such vectors, including, but
not limited to the methods referred to in the art as transfection, transformation,
transduction, or gene transfer, and including techniques such as microinjection,
DEAE-dextran-mediated endocytosis, calcium phosphate coprecipitation, electroporation,
liposome-mediated transfection, ballistic injection, viral-mediated transfection, and the
like. Cells or organisms which have undergone transfection are referred to herein as
"transfectants."

[0042] Stable Transfection. As used herein, the term "stable transfection"
means transfection, as defined above, which results in integration of all or a part of the 
vector into the genome of the transfected cell or organism. Cells or organisms which 
have undergone stable transfection are referred to herein as "stable transfectants." 

[0043] Operably Joined. As used herein, the term "operably joined" refers to
a covalent and functional linkage of genetic regulatory elements and a genetic coding
region which can cause the coding region to be transcribed into mRNA by an RNA
polymerase which can bind to one or more of the regulatory elements. Thus, a
regulatory region, including regulatory elements, is operably joined to a coding region
when RNA polymerase is capable under permissive conditions of binding to a promoter
within the regulatory region and causing transcription of the coding region into mRNA.
In this context, permissive conditions would include standard intracellular conditions
for constitutive promoters, standard conditions and the absence of a repressor or the
presence of an inducer for repressible/inducible promoters, and appropriate in vitro
conditions, as known in the art, for in vitro transcription systems.

[0044] Heterologous. As used herein, the term "heterologous" means, with 
respect to two or more genetic sequences, that the genetic sequences are not operably 
joined in nature or do not naturally occur within the same genome in nature. For 
example, if a vector includes a coding region which is operably joined to one or more 
regulatory elements, these sequences are considered heterologous to each other if they
are not operably joined in nature or they are not found in the same genome in nature. 

[0045] Nucleotide Positions. As used herein, all nucleotide positions are 
designated with respect to the strand of DNA which includes elements of the ferritin 
heavy chain gene region in the "sense" orientation. As will be apparent from the 
context, numerical nucleotide positions are either designated with respect to the 
position of the start codon of the ferritin heavy chain gene or with respect to the 
position within one of the sequences included in the Sequence Listing. In the former 
case, the adenosine or "A" of the start codon (ATG) is designated as position 1, with 
preceding positions being negatively numbered. In the latter case, the relevant SEQ ID 
NO will always be specified. Relative nucleotide positions will be described with 
reference to the conventional 5' and 3' directions on the sense strand. 
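
The start-codon-relative numbering described above can be illustrated with a short sketch. The function names and the use of a zero-based string index are assumptions made purely for illustration and are not part of the specification. The sketch also assumes that no position 0 exists, so that position -1 immediately precedes position 1.

```python
def to_start_codon_position(index: int, atg_index: int) -> int:
    """Convert a zero-based sense-strand index to start-codon-relative numbering.

    The A of the ATG start codon is position 1; preceding bases are numbered
    -1, -2, ... (assumption: there is no position 0).
    """
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset


def from_start_codon_position(position: int, atg_index: int) -> int:
    """Inverse of to_start_codon_position."""
    if position == 0:
        raise ValueError("position 0 is not used in this numbering scheme")
    return atg_index + (position - 1 if position > 0 else position)


# Hypothetical sense-strand sequence: five upstream bases followed by ATG.
sense = "GGCCT" + "ATGACG"
atg_index = sense.index("ATG")

for i, base in enumerate(sense):
    print(base, to_start_codon_position(i, atg_index))
```

Under these assumptions, a sequence with 168 bases upstream of the start codon would have its first base at position -168.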

[0046] Percentages of Nucleotide Sequence Identity. As used herein, the
percentage of sequence identity between two nucleotide sequences is calculated based
upon the number of residues which are identical between the aligned sequences divided
by the number of nucleotides present in the smaller of the two sequences. Before
calculation of the percentage identity, the sequences are aligned using the algorithm (or
an equivalent algorithm) of the ClustalW program with default values, available
through the European Bioinformatics Institute of the European Molecular Biology
Laboratory (EMBL) (http://www.ebi.ac.uk/clustalw), and described in Higgins et al.
(1994), "CLUSTAL W: Improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, position-specific gap penalties and weight
matrix choice," Nucleic Acids Res. 22:4673-4680.
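
The calculation defined above can be sketched as follows. The sketch assumes that the two sequences have already been aligned (for example, by ClustalW) and are supplied as equal-length strings in which `-` marks a gap. The function name `percent_identity` and this input format are illustrative assumptions, not part of the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned nucleotide sequences.

    Identical aligned residues are counted and divided by the number of
    nucleotides in the shorter of the two (ungapped) sequences.
    """
    a = aligned_a.upper()
    b = aligned_b.upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")

    # Count columns where both sequences carry the same nucleotide (gaps never match).
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")

    # Lengths of the original sequences, i.e. with gap characters removed.
    shorter = min(len(a.replace("-", "")), len(b.replace("-", "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one nucleotide")

    return 100.0 * matches / shorter


# Hypothetical alignment: one substitution in six aligned positions.
print(round(percent_identity("ACGTAC", "ACCTAC"), 1))
```

Because the denominator is the length of the shorter sequence, a short sequence that matches a stretch of a longer one at every position scores 100% under this definition.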

[0047] Derived From. As used herein, the term "derived from," when used in
relation to the origin of a nucleotide sequence, means that the sequence has been or can
be obtained or produced, directly or indirectly, from a reference sequence by making a
limited number of insertions, deletions or substitutions in the reference sequence.
Thus, for example, a sequence which is a subset of a reference sequence can be derived
from the reference sequence by deleting flanking sequences. Similarly, a sequence can
be derived from a reference sequence by a combination of insertions, deletions and/or
substitutions of one or more nucleotides in a reference sequence. The number of
insertions, deletions and substitutions can be limited by a required percentage identity
between the reference sequence and the derived sequence.

[0048] Numerical Ranges. As used herein, the recitation of a numerical range
for a variable is intended to convey that the invention may be practiced with the
variable equal to any of the values within that range. Thus, for a variable which is
inherently discrete, the variable can equal each integer value of the numerical range,
including the end-points of the range. Similarly, for a variable which is inherently
continuous, the variable can equal each real value of the numerical range, including the
end-points of the range. As an example, a variable which is described as having values
between 0 and 2, can be 0, 1 or 2 for variables which are inherently discrete, and can be
0.0, 0.1, 0.01, 0.001, or any other real value ≥ 0 and ≤ 2 for variables which are inherently
continuous.

[0049] Or. As used herein, unless specifically indicated otherwise, the
conjunction "or" is used in the "inclusive" sense of "and/or" and not the "exclusive"
sense of "either/or."

General Considerations.

[0050] The present invention depends, in part, upon the development of a high
expression "locus vector" derived from the ferritin heavy chain gene. The concept of a
"locus vector" is based on the observation that the regions found 5' and 3' to highly
expressed genes in their natural chromatin contexts can confer higher levels of
expression to a heterologous gene. Therefore, the present invention provides a ferritin
heavy chain gene locus vector which includes 5' and 3' sequences which convey high
levels of expression to heterologous genes in stable transfectants. Thus, the invention
provides genetic vectors for the stable transfection and expression at high levels of a
desired protein within eukaryotic cells.

The Ferritin Heavy Chain Gene.

[0051] The rat and human genomes contain multiple processed pseudogenes
of the ferritin heavy chain (Hentze et al. (1986), Proc. Natl. Acad. Sci. USA
83:7226-7230). The rat ferritin gene consists of four exons (i.e., exons 1 through 4) separated
by three introns (i.e., introns 1 through 3). GenBank Accession Nos. M18051, M18052
and M18053 disclose three gene segments which are shown in parts A, B, and C of
Figure 1. Together these three segments cover the four exons of the rat ferritin heavy
chain genomic sequence. Figure 1(A) shows 168 bp of 5' untranslated sequence,
including the transcriptional initiation site at position -168, followed by exon 1 and the
first 104 bp of the 5' end of intron 1. Exon 1 includes the start codon and encodes 38
amino acids. Figure 1(B) shows the last 50 bp of the 3' end of intron 1, followed by
exon 2 and the first 35 bp of the 5' end of intron 2. Figure 1(C) shows the last 33 bp of
the 3' end of intron 2, followed by exon 3, intron 3, exon 4 and 3' untranslated
sequence, including the stop codon and polyadenylation signal 132 bp after the
termination codon.

[0052] Because the insert sizes of cosmid libraries are quite large, they were
chosen to obtain sufficient 5' and 3' flanking regions. In particular, rat cosmid library
Catalog #RL1032m (BD Biosciences/Clontech, Palo Alto, CA) was selected. Other
libraries, however, also could have been used, or sequences could have been prepared
synthetically.

[0053] In order to avoid cloning processed pseudogenes when screening the
cosmid library, intron sequences were chosen to serve as probe templates. These
introns were cloned by PCR using rat genomic DNA (Catalog #6750-1, Clontech, Palo
Alto, CA) as a template and primers based on related cDNA and genomic sequences
from GenBank. Biotinylated probes were prepared using the introns as templates, and
the cosmid library was screened with them. One ferritin heavy chain gene cosmid
(15A) was isolated and mapped with restriction enzymes. The three segments of rat
genomic sequence from GenBank served as a guide to locate the coding regions and to
plan the production of the high expression locus vector.

Production of Ferritin Heavy Chain Gene High Expression Locus Vector.

[0054] The production of a high expression locus vector of the invention can
be accomplished in many ways. For example, the sequences forming the vector can be
obtained from a single clone or from multiple clones. The sequences can be based
entirely on the rat ferritin heavy chain gene, entirely on another mammalian ferritin
heavy chain, or on multiple mammalian ferritin heavy chain genes. The sequences can
be based on all naturally-derived sequences or a mixture of naturally-derived and
synthetic sequences. In addition, the locus vector can be produced by first obtaining
one or more large genomic fragments including all or part of the ferritin heavy chain
gene region and then deleting or inactivating undesired sequences while inserting
desired sequences, or can be produced by cloning or subcloning only the desired
fragments of the ferritin heavy chain gene region and then combining these with other
desired sequences. Similarly, mixtures of these approaches, employing cloning,
subcloning, deletion, inactivation and insertion can be employed to arrive at the desired
construct. The approach taken and the order of the various steps is irrelevant to the
invention and is within the discretion of one skilled in the art.

[0055] The high expression locus vectors of the invention include, in order
from 5' to 3', (a) distal 5' flanking sequences of a eukaryotic locus; (b) proximal 5'
regulatory sequences of a eukaryotic locus; (c) at least a first insertion site for a
heterologous sequence; and (d) proximal 3' regulatory sequences effective for
transcription termination of a eukaryotic locus. Optionally, linker sequences may be
present between segments (a)-(d). Furthermore, at least one of the distal 5' flanking
sequences and proximal 5' regulatory sequences has substantial identity with
corresponding sequences of a ferritin heavy chain gene. In some embodiments, distal 3'
flanking sequences are also included in the vector.
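
The segment order described above can be modeled with a minimal sketch. The names `Segment` and `check_locus_vector_layout` are hypothetical, the example lengths are invented for illustration, and the size limits are the broadest ranges given in the definitions above. The sketch checks only order and length; it says nothing about sequence identity or biological function.

```python
from dataclasses import dataclass

# Broadest size ranges (in bases) from the definitions: kind -> (min, max).
SIZE_RANGES = {
    "distal_5_flanking": (100, 100_000),
    "proximal_5_regulatory": (20, 10_000),
    "proximal_3_regulatory": (10, 2_000),
    "distal_3_flanking": (100, 100_000),
}

# Required 5'-to-3' order of segments (a)-(d).
REQUIRED_ORDER = [
    "distal_5_flanking",
    "proximal_5_regulatory",
    "insertion_site",
    "proximal_3_regulatory",
]


@dataclass
class Segment:
    kind: str
    length: int


def check_locus_vector_layout(segments):
    """Return a list of problems; an empty list means the layout passes."""
    problems = []
    # Linkers are optional between adjacent segments, so ignore them for ordering.
    kinds = [s.kind for s in segments if s.kind != "linker"]
    if kinds[:4] != REQUIRED_ORDER:
        problems.append("segments (a)-(d) are not in the required 5' to 3' order")
    # Distal 3' flanking sequences are optional and, if present, come last.
    if kinds[4:] not in ([], ["distal_3_flanking"]):
        problems.append("only optional distal 3' flanking sequences may follow (d)")
    for s in segments:
        if s.kind in SIZE_RANGES:
            low, high = SIZE_RANGES[s.kind]
            if not low <= s.length <= high:
                problems.append(f"{s.kind} length {s.length} outside {low}-{high}")
    return problems


# Hypothetical layout with invented lengths.
layout = [
    Segment("distal_5_flanking", 5000),
    Segment("proximal_5_regulatory", 300),
    Segment("linker", 12),
    Segment("insertion_site", 30),
    Segment("proximal_3_regulatory", 400),
    Segment("distal_3_flanking", 4000),
]
print(check_locus_vector_layout(layout))
```

A layout that swaps two segments, or whose proximal 5' regulatory sequences are shorter than 20 bases, would return one or more problem messages instead of an empty list.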

[0056] One embodiment of a high expression locus vector of the invention,
the pFerX8 vector described below, is disclosed in GenBank Accession No.
AY14793a 

A. Distal 5' Flanking Sequences and Proximal 5' Regulatory Sequences.
[0057] In some embodiments, the distal 5' flanking sequences of the locus
vector will include a sequence of 100-100,000 nucleotides having at least 70%-100%
identity to a nucleotide sequence found within the distal 5' flanking sequences of a
ferritin heavy chain locus. Thus, in some embodiments, the distal 5' flanking sequences
can include at least 100, 500, 750, 1,000, 10,000, 25,000, 50,000 or 100,000
nucleotides having at least 70%, 75%, 80%, 85%, 90%, 95% or 100% identity to a
nucleotide sequence found within the distal 5' flanking sequences of a ferritin heavy
chain locus. As shown in the examples below, the distal 5' sequences can include
1,000-10,000 bp, 2,000-9,000 bp, 3,000-8,000 bp or 4,000-7,000 bp of flanking
sequences.

[0058] In other embodiments, the distal 5' flanking sequences of the locus
vector will share lower percentages of identity with the corresponding ferritin heavy chain
gene sequences, and in some embodiments the distal 5' flanking sequences will be
unrelated to any corresponding ferritin heavy chain gene sequences.

[0059] Downstream from the distal 5' flanking sequences, the high expression
locus vector of the invention includes proximal 5' regulatory sequences. In some
embodiments, the proximal 5' regulatory sequences of the locus vector will include a
sequence of at least 20-10,000 nucleotides having at least 70%-100% identity to a
nucleotide sequence found within the proximal 5' regulatory sequences of a ferritin
heavy chain locus. Thus, in some embodiments, the proximal 5' regulatory sequences
can include at least 20, 50, 75, 100, 500, 1,000, 5,000 or 10,000 nucleotides having at
least 70%, 75%, 80%, 85%, 90%, 95% or 100% identity to a nucleotide sequence found
within the proximal 5' regulatory sequences of a ferritin heavy chain locus.
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[0060] In other embodiments, the proximal 5' regulatory sequences of the 
locus vector will share lower percentages identity with the corresponding ferritin heavy 
chain gene sequences, and in some embodiments the proximal 5* regulatory sequences 
will be unrelated to any corresponding ferritin heavy chain gene sequences. 
[0061] In all embodiments, the proximal 5' regulatory sequences must be 
effective to initiate transcription of the heterologous coding region to be inserted into 
the vector. Thus, in those embodiments in which the proximal 5' regulatory sequences 
are based upon the corresponding ferritin heavy chain gene sequences, they should not 
be varied to such an extent that the sequences become ineffective in initiating and 
promoting transcription. Thus, the conservation of features such as the "TATA box" or 
ribosome binding site, or the replacement of these features with equivalent sequences, 
is necessary to preserve functionality of the expression vector. On the other hand, it is 
also acceptable to completely replace these sequences with functional equivalents from 
other genes, including any of the many known proximal 5' regulatory regions from 
other genes. Similarly, it is acceptable to replace these sequences with chimeric 
sequences based upon the proximal 5' regulatory regions of two or more genes. 

[0062] In some embodiments, both the distal 5' flanking sequences and the 
proximal 5' regulatory sequences include a sequence of at least 100-1000 nucleotides 
having at least 70%-100% identity to a nucleotide sequence found within, respectively, 
the distal 5' flanking and proximal 5' regulatory sequences of a ferritin heavy chain 
locus. In some of these embodiments, the distal 5' flanking sequences and the proximal 
5' regulatory sequences have 70-100% identity to contiguous sequences found within a 
ferritin heavy chain locus. 

[0063] Because intron 1 of the ferritin heavy chain gene can contain positive 
regulatory elements, and can aid in RNA processing and transport, it can be 
advantageous to create a locus vector that includes the maintenance of all or a portion 
of intron 1 as part of the proximal 5' regulatory sequences. This can be accomplished 
by maintaining an ATG codon and, optionally, additional codons 5' to the beginning of 
the intron 1 sequences. If codons other than the ATG are maintained, they can be 
derived from the ferritin heavy chain gene exon 1 coding sequences or any other coding 
sequences (including synthetic or artificial sequences), and will encode the N-terminus 
of a fusion protein with the heterologous coding sequences. Such an N-terminus can 
function as a leader or signal sequence to aid in expression of the heterologous 
sequences. Alternatively, in other embodiments, an additional heterologous sequence 
insertion site (e.g., a single restriction site or a polylinker) can be inserted 5' to the 
beginning of intron 1 so that sequences encoding various N-terminal sequences (e.g., 
leader or signal sequences) can be inserted at will. The ATG codon can be provided as 
part of the vector, or can be part of the inserted heterologous sequences. 

[0064] However, there is no need to maintain either the ATG codon or any 
other codons prior to intron 1. Rather, in some embodiments, the ATG codon can be 
present in exon 2 or can be provided by a heterologous coding sequence. In such 
embodiments, the heterologous sequence insertion site will be present in exon 2, or at 
the intron 1/exon 2 junction, and the ATG codon either can be provided as part of the 
vector, or can be part of the inserted heterologous sequences. In all instances in which 
intron 1 is included in the vector, however, the splice donor and splice acceptor 
sequences of intron 1, or equivalent splice donor and acceptor sequences, must be 
maintained so that the intron sequences are post-transcriptionally removed. Other 
sequences within the intron can be deleted or varied, or additional sequences can be 
inserted, as described herein. However, in constructs in which intron 1 is maintained, 
insertion of a heterologous coding region, whether 5' or 3' of intron 1, must not disrupt 
the splice donor and acceptor sites, must reconstruct the splice donor and acceptor sites, 
or must provide equivalent splice donor and acceptor sites. 
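
The requirement that the splice donor and acceptor survive any manipulation can be checked mechanically. The sketch below is illustrative only: it assumes the canonical GT...AG intron boundaries, whereas the text also allows "equivalent" sites, and the intron interior is a hypothetical run of N placeholders. The 5' end GTGAGTGCGGCCT is the splice donor region printed in Example 1, and the 3' end is taken, as an assumption, to finish with the CAG that the text says regenerates the splice acceptor.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Intron ends taken from the sequences printed in the examples; the middle
# is a hypothetical placeholder because the full intron is not reproduced here.
intact = "GTGAGTGCGGCCT" + "N" * 20 + "CCATTTCAG"
disrupted = "GTGAGTGCGGCCT" + "N" * 20 + "CCATTTAAAT"

print(has_canonical_splice_sites(intact))
print(has_canonical_splice_sites(disrupted))
```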

[0065] Finally, because the ferritin heavy chain gene exon 1 also contains an 
iron regulatory element (IRE) 5' to the ATG (at approximately positions -138 to -111) 
that negatively controls translation depending on the level of iron (reviewed in 
Klausner et al. (1993), Cell 72:19-26), the creation of the locus vector can optionally 
include the deletion of the IRE from the proximal 5' regulatory sequences. 

B. Ferritin Heavy Chain Coding Regions . 

[0066] Typically, the locus vector will not include any coding regions from 
the ferritin heavy chain gene. However, depending upon the method by which the 
vector is created, ferritin heavy chain coding regions can be included intentionally or as 
artifacts. For example, if the entire ferritin heavy chain gene region is cloned into a 
vector with the intention of using only the distal 5' flanking sequences and/or proximal 
5' regulatory sequences (together "the 5' ferritin sequences"), the coding regions can be 
purposefully deleted in their entirety. Alternatively, a heterologous sequence insertion 
site (e.g., a single restriction site or a polylinker) and proximal 3' regulatory sequences 
(and optionally distal 3' flanking sequences) could be inserted immediately 3' to the 5' 
ferritin sequences without deleting the coding regions. Because of the intervening 
insertion, the coding regions would be inactivated. Similarly, all of the coding regions 
except the start codon could be deleted or, alternatively, the heterologous sequence 
insertion site and proximal 3' regulatory sequences (and optionally distal 3' flanking 
sequences) could be inserted immediately 3' to the start codon. In addition, a larger 
portion of the coding region can be maintained before the insertion of the heterologous 
sequence insertion site and proximal 3' regulatory sequences (and optionally distal 3' 
flanking sequences) so that a fusion protein can be produced. Finally, combinations of 
the foregoing approaches can be employed such that the ferritin heavy chain coding 
regions are partially deleted and partially inactivated by the insertion of intervening 
sequences. In some embodiments, however, in order to reduce the size of the vector, 
inactivated and untranslated sequences are deleted. 

C. Heterologous Sequence Insertion Site . 

[0067] Downstream from the proximal 5' regulatory sequences, the high 
expression locus vector of the invention includes an insertion site for a heterologous 
sequence, such as a polylinker site. The heterologous sequence insertion site can be 
any sequence into which a heterologous sequence can be inserted in a sufficiently 
controlled and predictable manner to allow for production of functional high expression 
locus vectors with a reasonable expectation of success. Insertion sites for a 
heterologous sequence can include sites for homologous recombination, site-directed 
integration (e.g., via transposons or viral constructs), or endonuclease-mediated 
restriction. The length of the insertion site can vary from 4 bp (for use with four-cutter 
restriction endonucleases) to 1,000 bp or 5,000 bp (for use with homologous 
recombination methods). However, in certain circumstances, the 3' end of the proximal 
5' regulatory sequences and the 5' end of the proximal 3' regulatory sequences can form 
an insertion site without the need for the inclusion of additional nucleotides between 
them. Thus, for example, the last two nucleotides of the proximal 5' regulatory 
sequences and the first two nucleotides of the proximal 3' regulatory sequences can 
form a 4 bp restriction site which can serve as an insertion site for the heterologous 
sequences. Alternatively, only one or a few nucleotides may be required to form an 
insertion site between these sequences. Thus, the length of the insertion site could be 0, 
1, 2, or 3 bp, as well as the 4 bp to 5,000 bp described above. 
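
The zero-length insertion site described above, in which two nucleotides from each flanking region together form a 4 bp restriction site, can be sketched as follows. The flanking sequences are hypothetical and chosen so that the junction reads GGCC, the palindromic site of the four-cutter HaeIII; the helper names are likewise illustrative and not part of the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def junction_site(five_prime_region: str, three_prime_region: str, half: int = 2) -> str:
    """Join the last `half` bases upstream with the first `half` bases downstream."""
    return (five_prime_region[-half:] + three_prime_region[:half]).upper()


def is_palindromic(site: str) -> bool:
    """Most type II restriction sites read the same on both strands."""
    return site.upper() == reverse_complement(site)


upstream = "TTTTCCGG"    # hypothetical 3' end of the proximal 5' regulatory sequences
downstream = "CCAAAAAA"  # hypothetical 5' end of the proximal 3' regulatory sequences
site = junction_site(upstream, downstream)
print(site, is_palindromic(site))
```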

[0068] In some embodiments, the heterologous sequence insertion site will 
include one or more nucleotide sequences, on either the sense or antisense strand, 
which serve as restriction site(s) for natural or artificial endonucleases. These 
restriction sites can be unique in the vector, and the insertion site can be a polylinker 
that includes a multiplicity of such restriction sites to afford greater flexibility of use 
with different restriction endonucleases. An example of such a polylinker is provided 
in Example 1 and Figure 3. 

D. Proximal 3' Regulatory Sequences . 

[0069] Downstream from the insertion site for the heterologous sequences, the 
high expression locus vector of the invention includes proximal 3' regulatory 
sequences. At a minimum, these sequences include a polyadenylation signal. In some 
embodiments, the proximal 3' regulatory sequences also include a transcriptional 
termination signal. In some embodiments, the sequences can include the translation 
termination or stop codon, whereas in other embodiments the stop codon will be 
included in the heterologous sequence insert. 

[0070] The proximal 3' regulatory sequences can be derived from the ferritin 
heavy chain gene, but need not be. For example, in some embodiments, the proximal 3' 
regulatory sequences of the locus vector will include a sequence of at least 10-2,000 
nucleotides having at least 70%-100% identity to a nucleotide sequence found 
within the proximal 3' flanking sequences of a ferritin heavy chain locus. Thus, in 
some embodiments, the proximal 3' regulatory sequences can include at least 10, 25, 
50, 100, 500, 750, 1,000, or 2,000 nucleotides having at least 70%, 75%, 80%, 85%, 
90%, 95% or 100% identity to a nucleotide sequence found within the proximal 3' 
regulatory sequences of a ferritin heavy chain locus. In other embodiments, the 
proximal 3' regulatory sequences will consist essentially of a polyadenylation signal, 
which can be derived from a ferritin heavy chain gene, a heterologous sequence, or a 
synthetic or artificial sequence. 

[0071] In other embodiments, the proximal 3' regulatory sequences of the 
locus vector will share lower percentages identity with the corresponding ferritin heavy 
chain gene sequences, and in some embodiments the proximal 3' regulatory sequences 
will be unrelated to any corresponding ferritin heavy chain gene sequences. 

E. Distal 3' Flanking Sequences . 

[0072] Downstream from the proximal 3' regulatory sequences, the high 
expression locus vector of the invention optionally includes distal 3' flanking 
sequences. The distal 3' flanking sequences can be derived from the ferritin heavy 
chain gene, but need not be. For example, in some embodiments, the distal 3' flanking 
sequences of the locus vector will include a sequence of at least 100-100,000 
nucleotides having at least 70%-100% identity to a nucleotide sequence found within 
the distal 3' flanking sequences of a ferritin heavy chain locus. Thus, in some 
embodiments, the distal 3' flanking sequences can include at least 100, 500, 750, 1,000, 
10,000, 25,000, 50,000, or 100,000 nucleotides having at least 70%, 75%, 80%, 85%, 
90%, 95% or 100% identity to a nucleotide sequence found within the distal 3' flanking 
sequences of a ferritin heavy chain locus. As shown in the examples below, the distal 
3' flanking sequences can include 1,000-10,000 bp, 2,000-9,000 bp, 3,000-8,000 bp or 
4,000-7,000 bp of flanking sequences. 

[0073] In other embodiments, the distal 3' flanking sequences of the locus 
vector will share lower percentages identity with the corresponding ferritin heavy chain 
gene sequences, and in some embodiments the distal 3' flanking sequences will be 
unrelated to any corresponding ferritin heavy chain gene sequences. 

[0074] The following examples illustrate some specific modes of practicing 
the present invention, but are not intended to limit the scope of the claimed invention. 
Alternative materials and methods may be utilized to obtain similar results. 

EXAMPLE 1 

Creation of a Ferritin Heavy Chain Locus Vector . 

[0075] In order to generate a high expression locus vector based on the ferritin 
heavy chain gene, three phases of development were employed: (1) cloning of a 
ferritin heavy chain gene with substantial 5' and 3' regions; (2) production of an 
expression vector based on at least one of these gene regions, and (3) optimization of 
the vector. As noted above, many other approaches could have been employed to 
produce the same or equivalent locus vectors. 

[0076] First, the region containing the ferritin heavy chain exons from cosmid 
15A was subcloned into the Litmus 38 vector (New England Biolabs) to generate 
plasmid pFerX1 (Figure 2). The BamHI-XhoI fragment was isolated from cosmid 15A 
and ligated into Litmus 38 digested with BamHI and SalI to generate plasmid pFerX1. 
Note that cosmid 15A was only partially sequenced and that some of the restriction site 
locations are based on restriction mapping. Therefore, some of the restriction site 
locations may not be accurate. 

[0077] Figure 3 illustrates the deletion of the fragment containing exons 2, 3, 
and 4 from pFerX1 and the insertion of a polylinker containing AatII and SalI 
restriction sites to generate plasmid pFerX2. The deleted HpaI fragment extended from 
the HpaI site in the insert to the HpaI site in the vector in pFerX1. The 5' end of the 
polylinker regenerated the HpaI site, but the 3' end did not. Screening for the 
orientation of the linker was done using PCR. 

[0078] The exon 1 coding region was deleted from pFerX2, leaving the ATG 
initiation codon and the following splice donor intact to generate plasmid pFerX3. 
Figure 4 illustrates that the deletion of the exon 1 coding region was accomplished by 
isolating the BamHI-BspHI (2515-2719) and NcoI-BamHI (2830-2515) fragments from 
pFerX2. BspHI and NcoI generate compatible overhangs which permitted the resulting 
fragments to be ligated together to generate pFerX3. As a result of this manipulation, 
exon 1 of the vector was changed from: 



          BspHI
CCAGCCGCCATC ATG ACC ACC GCG TCT CCC TCG CAA GTG CGC CAG AAC TAC CAC CAG GAC TCG GAG GCT 
GGTCGGCGGTAG TAC TGG TGG CGC AGA GGG AGC GTT CAC GCG GTC TTG ATG GTG GTC CTG AGC CTC CGA 
             Met Thr Thr Ala Ser Pro Ser Gln Val Arg Gln Asn Tyr His Gln Asp Ser Glu Ala 

                                                                     NcoI   Splice Donor
GCC ATC AAC CGC CAG ATC AAC CTG GAG TTG TAT GCC TCC TAC GTC TAT CTG TCC ATG GTGAGTGCGGCCT 
CGG TAG TTG GCG GTC TAG TTG GAC CTC AAC ATA CGG AGG ATG CAG ATA GAC AGG TAC CACTCACGCCGGA 
Ala Ile Asn Arg Gln Ile Asn Leu Glu Leu Tyr Ala Ser Tyr Val Tyr Leu Ser Met 

to: 

                 Splice Donor 
CCAGCCGCCATC ATG GTGAGTGCGGCCT 
GGTCGGCGGTAG TAC CACTCACGCCGGA 
             Met 
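
The BspHI/NcoI manipulation can be reproduced on the top strand with a short Python sketch. It relies on two facts about the enzymes (BspHI cuts T^CATGA and NcoI cuts C^CATGG, both leaving a CATG overhang) and on a transcription of the exon 1 sequence printed above, read so that each codon agrees with the amino acid shown beneath it. The variable and function names are hypothetical, and only one strand is modeled.

```python
# Top strand of exon 1 and the start of intron 1, transcribed from the text.
EXON1_BEFORE = (
    "CCAGCCGCCATCATGACCACCGCGTCTCCCTCGCAAGTGCGCCAGAACTACCACCAGGACTCGGAGGCT"
    "GCCATCAACCGCCAGATCAACCTGGAGTTGTATGCCTCCTACGTCTATCTGTCCATGGTGAGTGCGGCCT"
)


def cut_after_first_base(seq: str, site: str) -> tuple[str, str]:
    """Split seq after the first base of the first occurrence of site."""
    i = seq.find(site)
    if i == -1:
        raise ValueError(f"site {site} not found")
    return seq[: i + 1], seq[i + 1:]


# Keep everything upstream of the BspHI cut and downstream of the NcoI cut.
upstream, _ = cut_after_first_base(EXON1_BEFORE, "TCATGA")    # BspHI: T^CATGA
_, downstream = cut_after_first_base(EXON1_BEFORE, "CCATGG")  # NcoI:  C^CATGG

# The downstream fragment begins with the shared CATG overhang, so joining the
# two pieces recreates an ATG immediately followed by the splice donor.
ligated = upstream + downstream
print(ligated)
```

The joined sequence reads CCAGCCGCCATC ATG GTGAGTGCGGCCT when spaced by codon, which is the "after" sequence given in the text.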

[0079] Deletion of the exon 1 IRE was accomplished by replacing the SacII-EagI 
(2575-2639) fragment in pFerX3 with a linker that does not contain the IRE (but 
creates a 5' KpnI site for screening) to generate plasmid pFerX4. As a result of this 
manipulation, exon 1 of the vector was changed from (IRE underlined): 

        SacII (2575)                                                      EagI (2639) 
CAGAGTCGCCGCGGT TTCCTGCTTCAACAGTGCTTGAACGGAA CCCGGTGCTCGACCCCTCCGACCCCCGTCCGGCCGCTTTGAGCC 
GTCTCAGCGGCGCCA AAGGACGAAGTTGTCACGAACTTGCCTT GGGCCACGAGCTGGGGAGGCTGGGGGCAGGCCGGCGAAACTCGG 

to (linker shown in bold): 

            KpnI (2579) 
        SacII (2575)                        EagI (2611) 
CAGAGTCGCCGCGGTACCGGTGCTCGACCCCTCCGACCCCCGTCCGGCCGCTTTGAGCC 
GTCTCAGCGGCGCCATGGCCACGAGCTGGGGAGGCTGGGGGCAGGCCGGCGAAACTCGG 

[0080] A PCR fusion product was generated in a three step procedure to 
replace exons 2 through 4 with a polylinker containing SwaI and NotI, while 
maintaining the proximal 5' regulatory sequences and proximal 3' regulatory sequences 
of the ferritin heavy chain gene. As shown in Figure 5(A), the first PCR used cosmid 
15A (Figure 2) as a template. Primer locations for primers Fer1 and Fer4 are indicated 
by arrows. The "priming" regions for primers FN1 and FN2 are also indicated by bars. 
In the second step, shown in Figure 5(B), a Fer1-FN2 PCR product was generated. 
The location of the "priming" region of primer Swa-2 is indicated. In the third step, 
shown in Figure 5(C), a FN1-Fer4 PCR product was generated. The location of the 
"priming" region of primer Swa-1 is indicated. In the fourth and final step, as shown in 
Figure 5(D), the final PCR fusion product was generated by using the Fer1-Swa-2 and 
Swa-1-Fer4 products as templates and the Fer1 and Fer4 primers. The HpaI-AatII 
fragment was isolated from this product for insertion into the HpaI and AatII sites of 
pFerX4 to generate plasmid pFerX5 (see Figure 6). The PCR fusion reactions used in 
the first three steps to generate the SwaI-NotI polylinker are shown in TABLE 1. 

TABLE 1 

              Template(s)           5' primer       3' primer 
First PCR     Cosmid 15A            Fer1            FN2 
              Cosmid 15A            FN1             Fer4 
Second PCR    Fer1/FN2 product      Fer1            Swa-2 
              FN1/Fer4 product      Swa-1           Fer4 
Third PCR     Fer1/Swa-2 &          Fer1 or Fer3    Fer4 
              Swa-1/Fer4 products 


The PCR primers are shown below, where the polylinker sequence is shown in bold, 
and the complementary sequences between FN1 and FN2 or between Swa-1 and Swa-2 
are shown underlined. 

                  NheI NotI      AatII 
FN1     ACTTTCAGCTGCTAGCGGCCGCGCTGACGTCCCCAAGGCCAT 

                NotI   NheI 
FN2     ACGTCAGCGCGGCCGCTAGCAGCTGAAAGTGGAAAGGGTAT 

              SwaI          NotI     AatII 
Swa-1   CTTTCCATTTAAATCTGCTAGCGGCCGCTGACGTC 

              SwaI 
Swa-2   TAGCAGATTTAAATGGAAAGGGTATTTGTTATTGATC 
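
Because the fusion PCR depends on the primer pairs being complementary at their 5' ends, the overlap can be verified directly from the four primer sequences listed above. The following Python sketch is illustrative; the helper names are hypothetical, and the primer strings are the printed sequences with formatting spaces removed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FN1 = "ACTTTCAGCTGCTAGCGGCCGCGCTGACGTCCCCAAGGCCAT"
FN2 = "ACGTCAGCGCGGCCGCTAGCAGCTGAAAGTGGAAAGGGTAT"
SWA1 = "CTTTCCATTTAAATCTGCTAGCGGCCGCTGACGTC"
SWA2 = "TAGCAGATTTAAATGGAAAGGGTATTTGTTATTGATC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overlap_length(primer_a: str, primer_b: str) -> int:
    """Longest k for which the first k bases of each primer are reverse complements."""
    for k in range(min(len(primer_a), len(primer_b)), 0, -1):
        if reverse_complement(primer_a[:k]) == primer_b[:k].upper():
            return k
    return 0


print("FN1/FN2 overlap:", overlap_length(FN1, FN2))
print("Swa-1/Swa-2 overlap:", overlap_length(SWA1, SWA2))
print("NotI site in FN1:", "GCGGCCGC" in FN1)
print("SwaI site in Swa-1:", "ATTTAAAT" in SWA1)
```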

[0081] The SwaI site in the vector backbone of pFerX4 was removed by blunt 
cleavage of the plasmid with SwaI and insertion of the double-stranded oligo: 

GGCGCGCC 
CCGCGCGG 

which contains an AscI site to generate plasmid pFerX4.1. The SwaI site was removed 
from the vector backbone in order to make the SwaI site in the polylinker above unique. 

[0082] The vector backbones of pFerX4.1 (in which the insertion of the AscI 
oligo of Figure 15 destroyed the SwaI site in the vector backbone) and pFerX5 (which 
included the SwaI site in the backbone and the polylinker) were swapped using 
SacII-AatII fragments to generate plasmid pFerX5.1 (Figure 7). pFerX5.1 contained the 
polylinker but lacked the SwaI site in the backbone, making the SwaI site in the 
polylinker unique. 

[0083] The polylinker 

                     BglII  BstBI 
                CTGTGAGATCTGTTCGAATGG 
            TGCAGACACTCTAGACAAGCTTACCAGCT 
            AatII                    SalI 
          compatible              compatible 

was inserted into the SalI-AatII sites of pFerX5.1 to generate plasmid pFerX6. The 
polylinker includes both BglII and BstBI sites and was designed to receive the distal 3' 
flanking sequences of the ferritin heavy chain gene. 

[0084] The distal 3' flanking sequences of the ferritin heavy chain gene 
(AatII-BamHI fragment from cosmid 15A) were inserted into the AatII-BglII sites of 
pFerX6 to generate plasmid pFerX7 (Figure 8). 

[0085] The distal 5' flanking sequences of the ferritin heavy chain gene 
(BamHI fragment from pFerH1, a subclone of cosmid 15A, Figure 2: BamHI 
10269-15176) were inserted into the BamHI site of pFerX7 to generate plasmid pFerX8 
(Figure 9). 

[0086] The origins of the various sequences forming pFerX8 are shown in 
Figure 10. The Litmus 38 backbone is indicated by the filled box. This plasmid 
contains >6kb of distal 5' flanking sequences before the initiating ATG codon and 
~7kb of distal 3' flanking sequences following the termination codon. The SwaI and 
NotI cloning sites are located at positions 10240 and 10254, respectively. Coding 
regions inserted into the SwaI and NotI sites should be blunt ended at the 5' end (SwaI 
end) and should start with the bases CAG to regenerate the splice acceptor followed by 
the second amino acid. The NotI site should be present at the 3' end following the 
termination codon. 

[0087] An additional segment of distal 5' flanking sequence (BspEI fragment 
from cosmid 15A) was inserted into the BspEI site (6037) of pFerX8 to generate 
plasmid pFerX9 (Figure 11; BspEI fragment 6034-14211). This insertion adds a 
unique segment of distal 5' flanking sequence and also repeats a segment of the 
distal 5' flanking sequence already present in pFerX8 (10697-13990 is the same as 
2520-5813). This plasmid is not entirely sequenced and the locations of some of the 
restriction sites are estimated based on restriction fragment sizes. 

[0088] The sequence of the transcribed region of the pFerX8 and pFerX9 
plasmids is shown in Figure 12. The putative transcription start site is indicated. The 
intron is shown in lower case. The putative TATA and polyadenylation signals are 
underlined. The initiation codon is in the first exon and the inserted gene, starting with 
the second amino acid, is inserted into the SwaI and NotI sites. 



EXAMPLE 2 

Expression of Heterologous Sequences . 

A. Reporter Gene 

[0089] A reporter gene was inserted into the SwaI-NotI sites in the polylinker 
of both the pFerX8 and pFerX9 plasmids. Secreted alkaline phosphatase (SEAP) was 
selected as a reporter gene because the commercially available assay (Clontech, Palo 
Alto, CA) for the product is simple and rapid. The expression vectors were designated 
pFerX8SEAP and pFerX9SEAP. 

[0090] The sequence of the vector polylinker and the original sequence at the 
5' end of exon 2 that needs to be recreated to regenerate the splice acceptor are shown in 
Figure 5. Thus, the 5' primer should include a CAG at the 5' end to recreate the natural 
splice acceptor followed by the coding region starting with the second amino acid (the 
ATG is already included in exon 1). The 5' end of the PCR product should be left 
blunt-ended for ligation with the SwaI site. For example: 

General 5' primer: 

CAG NNN NNN NNN NNN NNN NNN NNN 
    AA2 AA3 AA4 AA5 AA6 AA7 AA8 

Primer for SEAP example: 

CAG CTG CTG CTG CTG CTG CTG CTG GGC 

[0091] The 3' primer should include a NotI site followed by the 3' end of the 
gene including the termination codon (opposite strand). The PCR product should be 
digested with NotI to generate an end compatible with the NotI site in the polylinker. 
For example: 

General 3' primer: 

NNNN GCGGCCGC NNN NNN NNN NNN NNN NNN NNN 
     NotI     3' end of gene 
     site 

Primer for SEAP example (termination codon in bold): 

TTTT GCGGCCGC AGC TCA TGT CTG CTC GAA GCG GCC 

[0092] Ligation of the PCR product with the vector (digested with SwaI and 
NotI) does not recreate a SwaI site at the 5' end of the insert. Instead the ligated 
product contains a suitable splice acceptor at the "SwaI end." The inserted region will 
also contain the coding sequence from the second amino acid to the termination codon 
followed by the NotI site at the 3' end. For example: 

After ligation generally: 

CCATTT CAG NNN NNN NNN // NNN NNN NNN GCGGCCGC TGACGT 

Example for SEAP: 

CCATTT CAG CTG CTG CTG // CAG ACA TGA GCGGCCGC TGACGT 
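
The primer rules described for the SwaI and NotI cloning sites can be expressed as a short Python sketch. It is only an illustration: the function names are hypothetical, the four-base 5' extension defaults to the TTTT used in the SEAP example, and the sense-strand 3' end of SEAP used here was obtained by reverse-complementing the SEAP 3' primer printed in the text rather than taken from an independent sequence record.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
NOTI_SITE = "GCGGCCGC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def forward_primer(cds: str, n_codons: int = 8) -> str:
    """CAG (splice acceptor) followed by codons 2 onward; the ATG stays in exon 1."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence is expected to start with ATG")
    return "CAG" + cds[3:3 + 3 * n_codons]


def reverse_primer(gene_3prime_end: str, extension: str = "TTTT") -> str:
    """Extension + NotI site + reverse complement of the gene's 3' end (with stop codon)."""
    return extension + NOTI_SITE + reverse_complement(gene_3prime_end)


# SEAP codons 1-9 as implied by the 5' primer in the text (ATG + 7 x CTG + GGC).
seap_start = "ATG" + "CTG" * 7 + "GGC"
# Sense-strand 3' end implied by the printed SEAP 3' primer (assumption, see above).
seap_end = "GGCCGCTTCGAGCAGACATGAGCT"

print(forward_primer(seap_start))
print(reverse_primer(seap_end))
```

Both printed strings match the SEAP primers given in the text once the codon spacing is removed.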

B. Transfections 

[0093] The host used for transfections was the CHO DG44(E) cell line 
(Urlaub et al. (1986), Somatic Cell Mol. Gen. 12:555-566), which had been selected for 
growth and survival in serum-free media. This cell line was maintained in a spinner 
flask in serum-free media with added nucleosides. The cells used for transfection were 
in exponential growth. Either 2x10^ or 5x10^ cells were used for each transfection. 

[0094] Reporter plasmids were co-transfected with a plasmid designated 
pSI-DHFR.2 encoding dihydrofolate reductase (DHFR) so that stable transfectants could be 
selected in the DHFR-negative host. The pSI-DHFR.2 plasmid includes a selectable marker and 
the dhfr gene driven by the SV40 promoter with the SV40 enhancer deleted (Figure 
13). 

[0095] All DNA was prepared by Megaprep kit (Qiagen, Valencia CA). Prior 
to transfection, DNA was EtOH precipitated, washed with 70% EtOH, dried, resuspended in 
HEBS (20 mM Hepes pH 7.05, 137 mM NaCl, 5 mM KCl, 0.7 mM Na2HPO4, 6 mM 
dextrose), and quantitated. As a positive control, a plasmid which 
expresses SEAP with an SV40 early promoter/enhancer (pSEAP2, Clontech, Palo Alto, 
CA) was employed. Negative controls included an empty pUC18 vector (ATCC 
#37253, American Type Culture Collection, Manassas, VA) as a reporter control and a 
no DNA transfection as a transfection control. 

[0096] Each transfection contained 50 µg of a reporter plasmid and 5 µg 
pSI-DHFR.2. Equal plasmid weight was selected rather than equimolar amounts. From a 
molarity perspective there are differences on the order of 3-5 fold between the control 
reporters and the test reporters (TABLE 3). In each case the test reporter was lower 
than the control. 



TABLE 3 

Reporter plasmid   Plasmid     Molar ratio    DHFR           Reporter 
(50 µg each)       size (kb)   to controls    (5 µg each)    gene 
pSEAP2             5.1         1              pSI-DHFR.2     SEAP 
pFerX8SEAP         18.9        0.27           pSI-DHFR.2     SEAP 
pFerX9SEAP         26.6        0.19           pSI-DHFR.2     SEAP 
pUC18              2.7                        pSI-DHFR.2     none 
No DNA                                        No DNA         none 
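
The molar ratios in TABLE 3 follow from the plasmid sizes alone, since at equal DNA mass the number of plasmid molecules is inversely proportional to plasmid length. A minimal Python sketch of that arithmetic is given below; the function and variable names are hypothetical, and the sizes are those listed in the table.

```python
def molar_ratio_to_control(control_kb: float, test_kb: float) -> float:
    """Molar ratio of test to control plasmid when equal masses are used."""
    return control_kb / test_kb


sizes_kb = {"pSEAP2": 5.1, "pFerX8SEAP": 18.9, "pFerX9SEAP": 26.6}
ratios = {
    name: round(molar_ratio_to_control(sizes_kb["pSEAP2"], kb), 2)
    for name, kb in sizes_kb.items()
}
print(ratios)
```

The same calculation gives size ratios of roughly 3.7 and 5.2 relative to pSEAP2, in line with the "3-5 fold" difference stated in the text.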



[0097] Cells and DNA were transfected by electroporation in 0.8 ml of HEBS 
using a 0.4 cm cuvette (BioRad, Hercules, CA) at 0.28 kV and 950 µF. After the 
electroporation pulse, the cells were allowed to incubate in the cuvette for 5-10 min at 
room temperature. They were then transferred to a centrifuge tube containing 10 ml of 
Alpha-MEM plus nucleosides (GIBCO, Gaithersburg, MD) with 10% dFBS (HyClone, 
Logan, UT) and pelleted at 1K rpm for 5 min. Resuspended pellets were seeded into 
T-flasks in Alpha-MEM without nucleosides with 10% dFBS and incubated at 37°C with 
5% CO2 in a humidified incubator until colonies formed. 

[0098] TABLE 4 summarizes seven experiments which were conducted. 
Transfections 1-3 were each performed in triplicate, and transfections 4-7 were 
performed once each. 

TABLE 4 

Exp. #   Reporter plasmid   DHFR           Reporter 
         (50 µg each)       (5 µg each) 
1        pSEAP2             pSI-DHFR.2     SEAP 
2        pFerX8SEAP         pSI-DHFR.2     SEAP 
3        pFerX9SEAP         pSI-DHFR.2     SEAP 
4        pFerX8             pSI-DHFR.2     none 
5        pFerX9             pSI-DHFR.2     none 
6        pUC18              pSI-DHFR.2     none 
7        No DNA             No DNA         none 



C. Transfection Efficiency 

[0099] Approximately 2 weeks after the transfections, colonies had formed. 
Stable transfectants were analyzed as either pools or isolates. Although all the 
pSI-DHFR.2-containing transfections produced colonies, the transfections containing the 
ferritin heavy chain locus vectors produced fewer colonies than did the controls. This 
was true whether or not the locus vector expressed a product. These results were 
surprising since the same amount of DNA was included in each transfection. Because 
of the difference in transfection efficiency it is recommended that multiple transfections 
be done to account for the reduced number of transfectants. 

D. SEAP Assay 

[0100] The reporter constructs containing the SEAP gene were analyzed using 
the Great EscAPe™ SEAP Reporter System 3 (Clontech, Palo Alto, CA). This assay 
uses a fluorescent substrate to detect the SEAP activity in the conditioned media. The 
kit was used in a 96-well format according to the manufacturer's instructions with the 
following exceptions. All standards and samples were diluted in fresh media rather 
than the dilution buffer provided. Instead of performing one reading after 60 min, 
multiple reads were taken at 10-20 min intervals and used to express SEAP activity as 
relative fluorescent units per minute (RFU/min). The emission filter available for the 
Cytofluor II plate reader was 460 nm instead of the recommended 449 nm. 
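
One way to turn several timed reads into a single RFU/min value is a least-squares slope of fluorescence against time. The text does not state how the rate was actually derived, so the Python sketch below is an assumption for illustration only; the function name and the read values are hypothetical.

```python
def rfu_per_min(times_min: list[float], rfu: list[float]) -> float:
    """Least-squares slope of fluorescence (RFU) versus time (min)."""
    n = len(times_min)
    if n != len(rfu) or 2 > n:
        raise ValueError("need at least two paired reads")
    mean_t = sum(times_min) / n
    mean_y = sum(rfu) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("reads must be taken at different times")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, rfu))
    return sxy / sxx


# Hypothetical reads taken at 10 min intervals from one well.
reads_t = [0.0, 10.0, 20.0, 30.0]
reads_rfu = [100.0, 300.0, 500.0, 700.0]
print(rfu_per_min(reads_t, reads_rfu))
```

Fitting a slope across several reads, rather than dividing a single endpoint by elapsed time, makes the rate less sensitive to background fluorescence at time zero.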

[0101] All of the data generated for the pools and isolates below was based on 
the reporter constructs expressing SEAP. The titers reported were based on a positive 
control with the kit. Although absolute values were not derived, the relative titer values 
are useful. 

[0102] The specific productivity was assessed in assays in which the media 
was exchanged for fresh media and then, 24 hours later, the media was sampled and the 
cells were counted. The product titer was normalized for the cell number at the end of 
the 24 hour assay. Because the titers were relative, the specific productivities are 
expressed as relative values. 

$$\text{Specific productivity} = \frac{\text{product titer (/ml)} \times \text{volume (ml)}}{\text{time (days)} \times \text{\# of cells}}$$
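
The specific productivity formula can be written as a small Python function. This is a sketch for illustration; the function name and the sample numbers are hypothetical, and the titer is in the same relative units used throughout the examples.

```python
def specific_productivity(titer_per_ml: float, volume_ml: float,
                          time_days: float, cell_count: float) -> float:
    """Relative product units per cell per day."""
    if 0 >= time_days or 0 >= cell_count:
        raise ValueError("time and cell count must be positive")
    return (titer_per_ml * volume_ml) / (time_days * cell_count)


# Hypothetical 24-hour assay: relative titer 100 per ml, 2 ml of medium,
# and 1,000,000 cells counted at the end of the assay.
print(specific_productivity(100.0, 2.0, 1.0, 1_000_000))
```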

E. Transfectant Pools 

[0103] After the appearance of colonies, the cells were collected and pooled 
from each transfection. Pools were seeded into 6-well plates or T-flasks and were kept 
subconfluent for the 24 hour assay. Results from the pool assays are shown in Figure 
14. Five pools were analyzed for each construct, two from experiment 1 (1A and 1B) 
and three from experiment 2 (2A, 2B, and 2C). All assays were done three to four 
weeks post-transfection. Note that the experiment 2C with pFerX8SEAP had a very 
low transfection efficiency relative to the other transfections. 

[0104] Specific productivities were fairly consistent with the control 
(pSEAP2) but highly variable with the pFerX8SEAP and pFerX9SEAP vectors. 
Notably, the ferritin vectors were capable of generating pools with higher specific 
productivities than the control. 

F. Transfectant Isolates 

[0105] Isolates were obtained by "picking" colonies from transfection 
experiment #2. "Picking" was accomplished by aspirating directly over a colony with a 
P200 Pipetman set at 50 µl. The aspirated colony was transferred first to a 48-well 
plate and then to a 6-well plate when there were a sufficient number of cells. Specific 
productivities were assessed in 6-well plates at near confluent to confluent cell densities 
using the 24-hour assay described above. 40-50 isolates were analyzed for each 
construct. The results are shown in Figure 15, in which the isolates are presented in the 
order of their specific productivity for each SEAP expression construct. The scale of 
specific productivity is consistent between the panels for comparison. 

[0106] The majority of the isolates (63%) from the pSEAP2 transfections did 
not express product above the limit of detection. The highest productivity from 
pSEAP2 in this experiment was 46 units per cell per day (relative value for 
comparison). In contrast only 28% of the isolates from the pFerX8SEAP transfections 
expressed product below the limit of detection and 44% had productivities above the 
highest pSEAP2 transfectant. The highest productivity from pFerX8SEAP in this 
experiment was 259 units per cell per day, more than five-fold higher than the highest 
productivity from pSEAP2. Although the pFerX9SEAP construct performed better 
than pSEAP2, it did not perform as well as pFerX8SEAP. 
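
The three summary figures used in this comparison (percentage of isolates below the detection limit, percentage above the best control isolate, and fold increase of the best isolate) can be computed from a list of per-isolate productivities. In the Python sketch below, the isolate values are hypothetical placeholders, not the data of Figure 15; only the maxima of 46 and 259 units per cell per day are taken from the text.

```python
def summarize_isolates(productivities: list[float], control_max: float,
                       detection_limit: float = 0.0) -> dict[str, float]:
    """Summarize isolate productivities relative to the best control isolate."""
    n = len(productivities)
    if n == 0 or 0 >= control_max:
        raise ValueError("need at least one isolate and a positive control maximum")
    below = sum(1 for p in productivities if detection_limit >= p)
    above = sum(1 for p in productivities if p > control_max)
    return {
        "pct_below_detection": 100.0 * below / n,
        "pct_above_control_max": 100.0 * above / n,
        "fold_over_control_max": max(productivities) / control_max,
    }


# Hypothetical isolate values; 46 and 259 are the reported maxima.
summary = summarize_isolates([0.0, 0.0, 10.0, 50.0, 259.0], control_max=46.0)
print(summary)
```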

EXAMPLE 3 

Reduction of Vector Size . 

[0107] In order to reduce the size of the vector for ease of use, 5' and/or 3' 
regions of the vector were deleted (TABLE 5). These deletions were tested as before 
using SEAP as a reporter. Approximately 30 isolates were tested from each of the 
plasmids shown in TABLE 5 as well as from the controls, pSEAP2 and pUC18 (10 
isolates). 



TABLE 5 

Plasmid        Region     5' end of the   3' end of the   Size of the 
               deleted    deletion*       deletion*       plasmid (bp)** 

pFerX8SEAP     none                                       19340 
pFerX10SEAP    5'         2513            7414            14439 
pFerX11SEAP    3'         13727           17636           15431 
pFerX12SEAP    5'         2513            7414            8042 
               3'         12704           19101 

*  The deletion end points are based on the pFerX8 sequence numbering. 
** The SEAP gene constitutes 1557 bp of the plasmid. 



[0108] The pFerX11SEAP vector performed similarly to the pFerX8SEAP 
vector, indicating that the ~3.9 kb deletion in the 3' region described in TABLE 5 was 
not detrimental. The pFerX10SEAP and pFerX12SEAP vectors did not perform as 
well as pFerX8SEAP, indicating that the ~4.9 kb 5' deletion described in TABLE 5 
was detrimental to function. 
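
The deletion sizes quoted above (~4.9 kb and ~3.9 kb) and the plasmid sizes in TABLE 5 can be cross-checked arithmetically. The following is a minimal sketch, assuming that the size of each deletion is simply the difference between its 3' and 5' end coordinates; that convention is inferred from the table's plasmid sizes rather than stated in the text, and the function and variable names are hypothetical.

```python
# Size of the undeleted parent plasmid (pFerX8SEAP), from TABLE 5.
FULL_LENGTH = 19340

# Deletion end points (5' end, 3' end) from TABLE 5.
deletions = {
    "pFerX10SEAP": [(2513, 7414)],
    "pFerX11SEAP": [(13727, 17636)],
    "pFerX12SEAP": [(2513, 7414), (12704, 19101)],
}

def deleted_bp(spans):
    """Total bases removed, assuming size = 3' end minus 5' end."""
    return sum(end - start for start, end in spans)

def remaining_size(spans, full_length=FULL_LENGTH):
    """Plasmid size left after removing every span."""
    return full_length - deleted_bp(spans)

sizes = {name: remaining_size(spans) for name, spans in deletions.items()}
for name, size in sizes.items():
    print(name, deleted_bp(deletions[name]), size)
```

Under this assumption the 5' deletion removes 4901 bp and the 3' deletion in pFerX11SEAP removes 3909 bp, and the computed plasmid sizes (14439, 15431 and 8042 bp) agree with the sizes listed in TABLE 5.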

EQUIVALENTS 

[0109] While this invention has been particularly shown and described with 
references to certain specific embodiments thereof, it will be understood by those 
skilled in the art that various changes in form and details may be made therein without 
departing from the spirit and scope of the invention as defined by the appended claims. 
Those skilled in the art will recognize, or be able to ascertain using no more than 
routine experimentation, many equivalents to the specific embodiments of the invention 
described specifically herein. Such equivalents are intended to be encompassed in the 
scope of the appended claims. 
CLAIMS 

What is claimed is: 



1. A genetic vector for stable transfection and expression of a desired protein 
within eukaryotic cells comprising: 

(a) distal 5' flanking sequences of a eukaryotic locus; 

(b) proximal 5' regulatory sequences of a eukaryotic locus; 

(c) at least a first insertion site for a first heterologous coding sequence; and 

(d) proximal 3' regulatory sequences effective for transcription termination of a 
eukaryotic locus; 

wherein said sequences are operably joined in order (a)-(d) in a 5' to 3' 
orientation, with optional linker sequences between adjacent sequences; and 
wherein 

(1) said distal 5' flanking sequences comprise a sequence of at least 100 bases 
having at least 70% identity to a nucleotide sequence found between 20 bp and 100,000 
bp 5' of a transcriptional initiation site of a ferritin heavy chain locus; or 

(2) said proximal 5' regulatory sequences comprise a sequence of at least 20 
bases having at least 70% identity to a nucleotide sequence found between 1 bp and 
10,000 bp 5' of a translational initiation codon of a ferritin heavy chain locus. 

2. A genetic vector for stable transfection and expression of a desired protein 
within eukaryotic cells comprising: 

(a) distal 5' flanking sequences of a eukaryotic locus; 

(b) proximal 5' regulatory sequences of a eukaryotic locus; 

(c) at least a first heterologous coding sequence encoding said desired protein; 
and 

(d) proximal 3' regulatory sequences effective for transcription termination of a 
eukaryotic locus; 

wherein said sequences are operably joined in order (a)-(d) in a 5' to 3' 
orientation, with optional linker sequences between adjacent sequences; and 
wherein 

(1) said distal 5' flanking sequences comprise a sequence of at least 100 bases 
having at least 70% identity to a nucleotide sequence found between 20 bp and 100,000 
bp 5' of a transcriptional initiation site of a ferritin heavy chain locus; or 

(2) said proximal 5' regulatory sequences comprise a sequence of at least 20 
bases having at least 70% identity to a nucleotide sequence found between 1 bp and 
10,000 bp 5' of a translational initiation codon of a ferritin heavy chain locus. 

3. A genetic vector as in any one of claims 1-2 wherein said distal 5' flanking 
sequences are derived from a ferritin heavy chain locus. 

4. A genetic vector as in any one of claims 1-2 wherein said proximal 5' regulatory 
sequences are derived from a ferritin heavy chain locus. 

5. A genetic vector as in any one of claims 1-2 wherein said proximal 5' regulatory 
sequences and said distal 5' flanking sequences are derived from a ferritin heavy chain 
locus. 



6. A genetic vector as in any one of claims 1-5 wherein said proximal 3' regulatory 
sequences are derived from a ferritin heavy chain locus. 

7. A genetic vector as in any one of claims 1-6 further comprising 
distal 3' flanking sequences of a ferritin heavy chain locus. 

8. A genetic vector as in any one of claims 1 and 3-7 wherein said insertion site 
for a heterologous sequence includes at least one restriction endonuclease site. 

9. A genetic vector as in claim 8 wherein said insertion site for a heterologous 
sequence is a polylinker site including at least two restriction endonuclease sites. 

10. A genetic vector as in any one of claims 1-9 wherein said proximal 5' regulatory 
sequences include a eukaryotic intron sequence. 

11. A genetic vector as in claim 10 wherein said eukaryotic intron sequence is 
derived from intron 1 of a ferritin heavy chain gene. 

12. A genetic vector as in any one of claims 1-11 wherein said proximal 5' 
regulatory sequences include untranslated exon sequences. 

13. A genetic vector as in any one of claims 1-12 wherein said distal 5' flanking 
sequences and said proximal 5' regulatory sequences have a total length of between 
1,000 and 10,000 bases. 

14. A genetic vector as in any one of claims 1-12 wherein said proximal 3' 
regulatory sequences and any distal 3' flanking sequences have a total length of 
between 1,000 and 10,000 bases. 

15. A eukaryotic cell transfected with a vector of any one of claims 1-14. 

16. A eukaryotic cell as in claim 15 wherein said vector has stably integrated into a 
chromosome of said cell. 

17. A eukaryotic cell as in any one of claims 15-16 wherein said first coding 
sequence is expressed in said cell. 

18. A eukaryotic cell comprising 

(a) distal 5' flanking sequences of a eukaryotic locus; 

(b) proximal 5' regulatory sequences of a eukaryotic locus; 

(c) at least a first coding sequence; and 

(d) proximal 3' regulatory sequences effective for transcription termination of a 
eukaryotic locus; 

wherein said sequences are operably joined in order (a)-(d) in a 5' to 3' 
orientation, with optional linker sequences between adjacent sequences; and 
wherein 

(1) said distal 5' flanking sequences comprise an exogenous sequence of at least 
100 bases having at least 70% identity to a nucleotide sequence found between 20 bp 
and 100,000 bp 5' of a transcriptional initiation site of a ferritin heavy chain locus; or 

(2) said proximal 5' regulatory sequences comprise an exogenous sequence of at 
least 20 bases having at least 70% identity to a nucleotide sequence found between 1 bp 
and 10,000 bp 5' of a translational initiation codon of a ferritin heavy chain locus. 

19. A eukaryotic cell comprising: 

an exogenous 5' distal flanking sequence derived from a ferritin heavy chain 
locus operably joined to a coding sequence. 

20. A method of producing a desired protein in a eukaryotic cell comprising: 

(a) providing at least one cell of any one of claims 15-19 or a descendent 
thereof; 

(b) maintaining said cell in a culture under conditions which permit high 
expression of said desired protein; and 

(c) isolating said desired protein from said culture. 

A. Exon 1 



Gsat BseRl Sfd - Earf 

AGCrCAGAGACCCAAGAGCCGCCrrCACAATCACACAGGCTCCTCCCCeCCCACGCACTGCTGSCTTGQGCAAC&CGCCTACAGGAA 
TCGAGTCTCTGGSTTCTCCGCGGAGTGTTASTSTGTCCGACGAGGGGCGGGTGCGTGACGXCCGAACCCGTTGTGCGGATGTCCrTCTCCT 

Eael SssKQ EcoNi BamHl Sadl 

G ATTGS CCGGAGCSCG CCTGACGCAGGATCCCG CTATAAAS TGCGCCCCSCTGGTCCCTACCCCACACgTTCTeCCCCACAGTgCCCGCgCTTTCCTOCTTCAACAGTGCTTO 
CTAACCG5CCTCGCGCGGACTGCGTCCTACGCC GATATTrC ACGCCGGGCGACCAGGSATGCGGTCTgeAAflASCCSaTCTCASCCgJgCCAAAG(^ 

Sann Bpml 
BsHKAl PshAI Eagl BpulOl Bsf8l 

ACGGAACCCCGTGCrTCGACCCCTCCGMCCCC6TCCGGCMCTT?flAGCCT0A5CCCTT7CCAACTTCGTCGCTCCaCC»CTCCAGCGTCGCCTC^^ 
TGCCTTGGGCCACGAGCTSCK:GASSCTGGSGGCAGGCCGGCGAAACTC&SACT?GGGAAACGTTGAAG7i^CSAGGC5GCGA5GTCGCAaCaaASGCG7^ 

OnjI 

BspHl BsmBi Bpml Ncoi 

ATCATCACCACCGCGTCTCCCTCGCAACTGCCCCAGAACTACCACCAGGACTCCGAGGCTCCCATCAACCCCCAGATCAACCTGGAGT^ 

TACTACTCGTCGCGCAGAGCGACCGTTCACGCGGTCTTGATGCTGGTCCTGAGCCTCCGACGGTAGTTGGCGGTCTAGTTGGACCrCAACATACGGAGGATGCAGATAGAC^ 

>MetThrThrAlaSerProSerGlnValArgGlnAsnTyrHisGlnAspSerGluAlaAlaIleAsnArgGlnIleAsnLeuGluLeuTyrAlaSerTyrValTyrLeuSer 

Oram 

Styl Adel 

ATGGTGAGTGCGGCCTGGCCTTTGCGGGGGCGGAAAGAGGGTGCGGCCTGGCCTCCCTTGGGCCACTTGCTGAGCTGaCGGASGGTGGGTTGSOT 
TACCACTCACGCCGGACCGGAAACGCCCCOGCCTTTCTCCCACGCCGaACCGGAGGGAACCCGGTOAACCACTCGACCGCCTCCCACCCAACCCCGCACCGACGCCC 



B. Exon 2 



EeoSTI Bsp^ 

GCATCTGCCTTGCTOTGQGGATCAATAACAAATACCCrrTCCACTTTCAGTCTTGTTATTTTGACCGGGATC 
CGTAGACGGAACOACACCCCTAGTTATTGTTTATGGGAAAGGTGAAAGTCAGAACAATAAAACTGGCCCTACTACACCGGGACT^ 

>SerCysTyrPheAspArgAspAspValAlaLeuLysAsnPheAlaLysTyrPheLeuHisGlnSer 
PsU 

Eail Nspl Pstl BstXI Sbf] EcoRV AccI 

CATGAAGAGAGGGAACATCCTGAGAAACrGATGAAGCTGCAGAACCAGCGAGGTGGACGAATCTTCCTGCAGGATATCAAGGTAAGTAGACTATaGGACTGCGTTAAATGAG 

>HisGluGluArgGluHisAlaGluLysLeuMetLysLeuGlnAsnGlnArgGlyGlyArgIlePheLeuGlnAspIleLys 



C. Exons 3 and 4 



Pstl Afflll BfTOf BsiBI BsmlBsrOI ApaU Bsgf 

CTGCAGATGAATTGACATGTTTCTTTGATTCAGAAACCTGACCGTGATGACTGGGAGAGCGGGCTGAATGCAATGAGGTGTGCACTGCACTTGGAAAAGAGTGTGAAT^ 
GACCTCTACTTAACTGTACAAAGAAACTAAGTCTTTGCACTGGCACTACTCACCCTCTCGCCCGACTTACGTTACTCCACACGTGACGTGAAC C TTTTCTCACA 

>LysProAspArgAspAspTrpGluSerGlyLeuAsnAlaMetArgCysAlaLeuHisLeuGluLysSerValAsnGlnSer 
Pmtl Ofalll 

CTACTGGAACTTCACAAACTGGCTACTGACAAGAATGATCCCCACGTGAGTATCAGAAACACGGGGTCAGTGGAGATGATTTGCCACAGGGCTTGGGACAGCTGACCAQT 
GATGACCTTGAAGTGTTTGACCGATGACTGTTCTTACTAGGGGTGCACrCATAGTCTTfGTGCCCCACrCACCTCTACTAAACGGTCTCCCGAACCCTCTCfflACTGGTCA^ 

>LeuLeuGluLeuHisLysLeuAlaThrAspLysAsnAspProHis 

BsRiBI BspMl Xcml Bmrl BstBl PcnH 

CTa?CCCATGTTCTCTTTCCTAGTTATGTGACTTCATTGA6ACGCATTACCTGAATGAC»GGTGAAATCCATTAAAGAACTGGGTGACCACXrr^ 
GACAGGGTACAAGAGARAGOATCAATACACTOAAGTAACTCTGCGTAATGGACTTACTCGTCCACTTTAGGTAATTTCTTGACCCACTGGTGCACTGGTTCAATG 

>LeuCysAspPheIleGluThrHisTyrLeuAsnGluGlnValLysSerIleLysGluLeuGlyAspHisValThrAsnLeuArgLysMetG 
Bann Msil BIpf Aatll Styl BstAF 

GAG CCCCT GA ATCTGG CAT GGOVGAATAT CTCT TTG ACAAG CA CACCCTGGGACACGGTGATG AGAGCTAAG CTGACGTCCCCAAGGCCATGTGACTTTACTCGCT 
CTCGGGGACTTAGACCGTACCGTCTTATAGAGAAACTGTTCGTGTGCGACCCTGTGCCACTACTCTCGATTCGACTGCAGGGSTTCCGGTACACTGAAATGACCGAGTGACTCC 

>lyAlaProGluSerGlyMetAlaGluTyrLeuPheAspLysHisThrLeuGlyHisGlyAspGluSer • 
PpulOI 

Nsil Xapl 
EcoTZ2f Apoi Kpni 

SphI Acs! BanI 

CAGTGCATGCATCr CAGGCTGCCTTTATCTTTTCTAT AAGTTG CACCAAAACATCTGCTT AA AAGTT CTTT AATTT GT ACCATTTCTTCAAAT AAAG^ 
GTCACCTACCTACAGTCCGACGGAAATAGAAAAGATATTCAACGTGGTTTTGTAGACGAATTTTCAAGAAATTAAACATGGTAAAGAAGTTTATTTCrTAJ^ 

Sspl 

CTTGTTGTGATTGAGCATGAGCGCACCAGCTTCCCTTGCGTCGGCTATATAACCACACTGCAACGCCTGAAAGAATATTTATTAAACTCGTAGTTGGGCAAAGATAGTGAAAGA 
GAACAACACTAACTCCTACTCGCGTGSTCGAAGGGAACGCASCCGATATATTGGTGTGACGTTOCGGACTTTCTTATAAATAATTTGAGCATCAACCC^ 

OseOi Styl 

OrdI BspMI Ncol Xmnl SmU 

CAGGTGTGTTCAGACAGGACTAACCAGTCCT6GTTCTGAGTTACCTGCCAGACTGCCATGGGAACATATTCTTGAQTGTC 
CTCCACACAACTCTCTCCTGATTCGTCAGGACCAAGACTCAATGCACGGTCTGACGGTACCCrTGTATAAQAACTCACAG 

A. 



Fer1 

ATC TGT CCA TG 6TGAGTCH:G6CCTGGCCTTTGGCGGGGCGGAAAGAG6GTGC6GCCI6GCCTCCCTTGG6CCACTTGGT6AGCTGGC6(^ 

ctgccgtggcaagtctgagcacctagcgctttgtggctcctgcatagaccaggcacgtcataacacc^^ 

aacttactctaaccacttctgaagcagcggcctctacatctctgcttatcacagagcctcacttgcattgaa^ 

ctgtaatcaccctoaccttgccaaggcatctagagtactgtacgtttttaatttttattttgcaccagttgttgct^ 

aacatacttgttggaaaaagcccacggttgggaaaaaaccattatcgtggaatacaaatacactgagtgcctaaaactgaa;^ 

HpaI 

caatqtatttgtgctaaaatacaatgccctcagttcttaaccaggtaatcagcagttggctgtctagctgaaaaccttgagaccttgtgttaacc^ 

ttttttttatttaacatgattgttgaaagagagaattgacctcccaatgtag6g(»crttagcaccccccctctcagacaaatagatat0g 

gcttaaagttttttctctgcactaatgtggagccatagaacccttgataaagccaagtcccaagtt tg ttttcccatc 

Exon 2 

TAGGGTGACAAACAGCCTTTACCACCATTGCATCTGCCTTGCTGTGGGGATCaATAACAAATACCCTTTCCACTTTCAG TCT TGT TAT TTT 

>Ser Cys Tyr Phe 



GAC CGG GAT GAT GTG GCC CTG AAG AAC TTT GCC AAA TAG TTT CTC CAT CAA TCT CAT GAA GAG AGG GAA CAT 

>Asp Arg Asp Asp Val Ala Leu Lys Asn Phe Ala Lys Tyr Phe Leu His Gln Ser His Glu Glu Arg Glu His 
GCT GAG AAA CTG ATG AAG CTG CAG AAC CAG CGA GGT GGA CGA ATC TTC CTG CAG GAT ATC AAG GTAAGTAGACTA 

>Ala Glu Lys Leu Met Lys Leu Gln Asn Gln Arg Gly Gly Arg Ile Phe Leu Gln Asp Ile Lys 

TGGGACTGCGTTAAATGAGCAGTlfNinmmniNmmNmmNNNlINMHHHNNNKNNimHl^ 
NKNHNHlimrNHmiNNmfimNNNNMKNNNNlimiimNNNNNKNNNNHNNNKNimHNNNKMNm 

Exon 3 

TTTCTTTGATTCAG AAA CCT GAC CGT GAT GAC TGG GAG AGC GGG CTG AAT GCA ATG AGG TGT GCA CTG CAC TTG 

>Lys Pro Asp Arg Asp Asp Trp Glu Ser Gly Leu Asn Ala Met Arg Cys Ala Leu His Leu 

GAA AAG AGT GTG AAT CAG TCA CTA CTG GAA CTT CAC AAA CTG GCT ACT GAC AAG AAT GAT CCC CAC GTGAGTAT 

>Glu Lys Ser Val Asn Gln Ser Leu Leu Glu Leu His Lys Leu Ala Thr Asp Lys Asn Asp Pro His 

Exon 4 

CAGAAACACGGGGTGAGTGGAGATGATTTGCCACAGGGCTTGGGAGAGCTGACCAGTAACCCTGTCCCATGTTCTCTTTCCTAG TTA TGT GAC 

>Leu Cys Asp 

TTC ATT GAG ACG CAT TAC CTG AAT GAG CAG GTG AAA TCC ATT AAA GAA CTG GGT GAC CAC GTG ACC AAC TTA 

>Phe Ile Glu Thr His Tyr Leu Asn Glu Gln Val Lys Ser Ile Lys Glu Leu Gly Asp His Val Thr Asn Leu 

CGC AAG ATG GGA GCC CCT GAA TCT GGC ATG GCA GAA TAT CTC TTT GAC AAG CAC ACC CTG GGA CAC GGT GAT 

>Arg Lys Met Gly Ala Pro Glu Ser Gly Met Ala Glu Tyr Leu Phe Asp Lys His Thr Leu Gly His Gly Asp 
AatII (5419) 

GAG AGC TAA GCTGACGTCCCCAAGGCCATGTGACTTTACTGGTCACTGAGGCAGTGCATGCATGTCAGGCTGCCTTTATCTTTTCTATAAGTT 

>Glu Ser ■ 

, ^ 

GCACCAAAACATCTGCTTAAAAGTTCTTTAATTTGTACCATTTCTTC AAATAAAG AATTTTGGTACCCAGCTCTTGTTGTGATTG 

^ 



Fer1 

ATC TGT CCA TO GTGAGTGCGGCCTGGCCTTTQ0CGGGGCGGAAAGAGGGTGCGGCCTGGCCTCCCTT6GGCCACTTGGTGAGCTGGCGGAGGG 



TGGGTTGG6GCGTG6CCTGCTGCGGGCTTCCCC6CCTTCCA6CGCCCTTCT66AAAATG6AGTTTGTCCGGGGTTCTTTCCAAAGGCAGGCAGCCCT 
GCCGTGGCAAGTCTGAGCACCTAGCGCTTTGTGGCTCCTGCATAGACCAGGCACGTCATAACACCCGTGTTTTGAAGCCTTAGG6CTGTACAACTGT 
CAGCCTCTCCAATCAACCCTGCAGTTAGGTGCATTTTCCTGCACTCTCGTCCCCTCCGGTCACATGGCCTGCAGGCTTCTCTGTTTGGGTGTACATC 
CAGCTCCAGTTCCTCTGACTATGGCGGGTCTGCTTGOTCATGGTGTGGAATGGCAGCCCTGGGGCTTGGTACAAAGAGGCTTATCTCTTGTGAACTT 
ACTCTAACCACTTCTGAAGCAGCGGCCTCTACATCTCTGCTTATCACAGAGCCTCACTTGCATTGAAACTTATCGCTAGGAATCTCCCCTTCTGTAA 
TCACCCTGACCTTGCCAAGGCATCTAGAGTACTGTACGTTTTTAATTTTTATTTTGCACCACTTGTTGCTTACTAACAGAAGTAGTAGGTAACATAC 
TTGTTGGAAAAAGCCCACGGTTGGGAAAAAACCATTATCGTGOAATACAAATACACTGAGTGCCTAAAACTGAAAATCAAAGCTTCTCCCAATGTAT 

HpaI 

TTGTGCTAAAATACAATGCCCTCAGTTCTTAACCAGGTAATCAGCAGTTGGCTGTCTAOCTGAAAACCTTGAGACCTTGTGTTAACCATTTTTTTTA 
TTTAACATGATTGTTGAAGGAGAGAATTGACCTCCCAATGTAGGGCACTTTAGCACCCCCCCTCTCAGACAAATAGATATGGCCTTGGCTTAAAGTT 
TTTTCTCTGCACTAATGTGGAGCCATAGAACCCTTGATAAAGCCAAGTCCCAAGTTTGTTTTCCCATCCTTACTTTAAAGGCCAAGTAGGGTGACAA 

Swa-2 NotI FN2 

ACAGCCTTTACCACCATTGCATCTGCCTTGCTGTG6GGATCAATAACAAATACCCTTTCCACTTTCAGCTGCTAGCGGCCGCGCTGACGT 

^ 

^ 

FN1 Swa-1 NotI AatII 

ACTTTCAGCTGCTAGCGGCCGCGCTGACGTCCCCAAGGCCATGTGACTTTACTGGTCACTGAGGCAGTGCATGCATGTCAGGCTGCCTTTO 



Fer4 

TCTATAAGTTGCACCAAAACATCTGCTTAAAAGTTCTTTAATTTGTACCATTTCTTCAAATAAAGAATTTTGGTACCCAGCT 



Fer1 

ATC TGT CCA TG GTGAGTGCGGCCTGGCCTTTGGCGGGGCGGAAAGAGGGTGCGGCCTGGCCTCCCTTGGGCCACTTGGTGAGCTGGCGGAGGG 
► 

TGGGTTGGGGCGTGGCCTGCTGCGGGCTTCCCCGCCTTCCAGCGCCCTTCTGGAAAATGGAGTTTGTCCGGG6TTCTTTCCAAAGGCAGGCAGCCCT 
GCC6T6GCAAGTCTGAGCACCTAGCGCTTTGTG6CTCCTGCATAGACCAGGCAC6TCATAACACCCGTGTTTTGAA6CCTTAGG6CTGTACAACTGT 
CAGCCTCTCCAATCAACCCTGCAGTTAGGTGCATTTTCCTGCACTCTCGTCCCCTCCGGTCACATGQCCTGCAGGCTTCTCTGTTTOGGTGT^ 
CAGCTCCA6TTCCTCTGACTATGGCGGGTCTGCTTGGTCATGGTGTGGAATGGCAGCCCT0GGGCTTG6TACAAAGAGGCTTATCTCTTGTGAACTT 
ACTCTAACCACTTCTGAA6CAGCGGCCTCTACATCTCTGCTTATCACAGAGCCTCACTT6CATTGAAACTTATCGCTAGGAATCTCCCCTTCTGTAA 
TCACCCTGACCrrTGCCAAGGCATCTAGAGTACTGTACGTTTTTAATTTTTATTTTGCACCAGTTGTTGCTTACTAACAGAAGTAOTAGGTAACATA^ 
TTGTTGGAAAAAGCCCACGGTT6GGAAAAAACCATTATCGTGGAATACAAATACACTGAGTGCCTAAAACTGAAAATCAAAGCTTCTCCCAATC 

HpaI 

TTGTGCTAAAATACAATGCCCTCAGTTCTTAACCAGGTAATCAGCAGTTGGCTGTCTAGCTGAAAACCTTGAGACCTTGTGTTAACCATTTTTXTTA 
TTTAACATGATTGTTGAAGGAGAGAATTGACC TCCCAATGT AGG6CACTTTAGCACC C C CC CTCT C AGACAAATAGATATGG C CTTGGCTTAAAGTT 
TTTTCTCTGCACTAATGTGGAGCCATAGAACCCTTGATAAAGCCAAGTCCCAAGTTTGTTTTCCCATCCTTACTTTAAAGGCCAAGTAGGGTGACAA 

Swa-1 SwaI Swa-2 NotI AatII 
ACAGCCTTTACCACCATTGCATCTGCCTTGCTGTGGGGATCAATAACAAATACCCTTTCCATTTAAATCTGCTAGCGGCCGCTGACGTCCCCAAGGC 

^ 

^ 

CATGTGACTTTACTGGTCACTGAGGCAGTGCATGCATGTCAGGCTGCCTTTATCTTTTCTATAAGTTGCACCAAAACATCTGCTTAAAAGTTCTT^ 

Fer4 

ATTTGTACCATTTCTTC AAATAAA QAATTTTGGTACCCAGCTCTTGTTGTGATTG 

^ 

SacII 

BamHI (9469) CAP (9521) KpnI (9533) 

GGATCCCGCTATAAAGTGCGGCCCGCTGGTCCCTACGCCAGACGTTCTCGCCCAGAGTCGCCGCGGTACCGGTGCTCG 

ACCCCTCCGACCCCCGTCCGGCCGCTTTGAGCCTGAGCCCTTTGCAACTTCGTCGCTCCGCCGCTCCAGCGTCGCCTC 

CGCGCCTCGCCCAGCCGCCATC ATG gtgagtgcggcctggcctttggcggggcggaaagagggtgcggcctggcct 

► Met 

cccttgggccacttggtgagctggcggagggtgggttggggcgtggcctgctgcgggcttccccgccttccagcgccc 
ttctggaaaatggagtttgtccggggttctttccaaaggcaggcagccctgccgtggcaagtctgagcacctagcgct 
ttgtggctcctgcatagaccaggcacgtcataacacccgtgttttgaagccttagggctgtacaactgtcagcctctc 
caatcaaccctgcagttaggtgcattttcctgcactctcgtcccctccggtcacatggcctgcaggcttctctgtttg 
ggtgtacatccagctccagttcctctgactatggcgggtctgcttggtcatggtgtggaatggcagccctggggcttg 
gtacaaagaggcttatctcttgtgaacttactctaaccacttctgaagcagcggcctctacatctctgcttatcacag 
agcctcacttgcattgaaacttatcgctaggaatctccccttctgtaatcaccctgaccttgccaaggcatctagagt 
actgtacgtttttaatttttattttgcaccagttgttgcttactaacagaagtagtaggtaacatacttgttggaaaa 
agcccacggttgggaaaaaaccattatcgtggaatacaaatacactgagtgcctaaaactgaaaatcaaagcttctcc 
caatgtatttgtgctaaaatacaatgccctcagttcttaaccaggtaatcagcagttggctgtctagctgaaaacctt 
gagaccttgtgttaaccattttttttatttaacatgattgttgaaggagagaattgacctcccaatgtagggcacttt 
agcaccccccctctcagacaaatagatatggccttggcttaaagttttttctctgcactaatgtggagccatagaacc 
cttgataaagccaagtcccaagtttgttttcccatccttactttaaaggccaagtagggtgacaaacagcctttacca 

AatII (10785) 

SwaI (10762) NotI (10776) 

ccattgcatctgccttgctgtggggatcaataacaaataccctttccatttAAATCTGCTAGCGGCCGCTGACGTCCC 

CAAGGCCATGTGACTTTACTGGTCACTGAGGCAGTGCATGCATGTCAGGCTGCCTTTATCTTTTCTATAAGTTGCACC 

KpnI (10927) 

AAAACATCTGCTTAAAAGTTCTTTAATTTGTACCATTTCTTCAAATAAAGAATTTTGGTACCCAGCTCTTGTTGTGAT 
TGAGGATGAGCGCACCAGCTTCCCTTGCGTCGGCTATACTAACCACACTGCA 